DeepSeek-R1 Distilled

mentions: 1, type: Person

// recent coverage: 1 mention

2026-05-16 18:22, research.nvidia.com, large-language-models

iGRPO: Self-Feedback-Driven LLM Reasoning

Researchers introduced Iterative Group Relative Policy Optimization (iGRPO), a two-stage reinforcement learning method that improves large language model reasoning by having the model generate and ref…

// co-occurs with top 7 entities

GRPO (1), iGRPO (1), Nemotron-H-8B-Base-8K (1), OpenReasoning-Nemotron-7B (1), AceReason-Math (1), AIME24 (1), AIME25 (1)